In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
X = np.random.rand(100)             # 100 uniform samples in [0, 1)
y = X + 0.1 * np.random.randn(100)  # true slope 1, intercept 0, plus Gaussian noise
In [3]:
plt.scatter(X, y);
plt.show()
We are following the steps prescribed by Jake VanderPlas in his excellent text, the Python Data Science Handbook. He has kindly provided all of his code on GitHub as well.
In [4]:
from sklearn.linear_model import LinearRegression
In [5]:
model = LinearRegression(fit_intercept=True)
In [6]:
X = X.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
In [7]:
X.shape
Out[7]:
In [8]:
model.fit(X, y)
Out[8]:
In [9]:
model.coef_
Out[9]:
In [10]:
model.intercept_
Out[10]:
If you are statistically trained, you would normally dig into further diagnostics, such as the normality of the residuals, and check for autocorrelation. You may also want to evaluate the parameters themselves. These are valid statistical modelling questions.
Machine learning, by contrast, focuses on prediction, so you will not find this information in the scikit-learn package. Do take note of this key difference between statistics and machine learning.
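For the statistically inclined, the diagnostics mentioned above can still be computed by hand from the residuals. A minimal sketch, assuming the same synthetic data as in the cells above (the random seed and the use of `scipy.stats.shapiro` are my own choices, not from the original):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Regenerate the synthetic data with a fixed seed for reproducibility
rng = np.random.default_rng(0)
X = rng.random(100).reshape(-1, 1)
y = X.ravel() + 0.1 * rng.standard_normal(100)

model = LinearRegression(fit_intercept=True).fit(X, y)
residuals = y - model.predict(X)

# Shapiro-Wilk test for normality of the residuals
stat, p_value = stats.shapiro(residuals)

# Durbin-Watson statistic for autocorrelation
# (always lies in [0, 4]; values near 2 suggest no autocorrelation)
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
```

Dedicated statistics packages such as statsmodels report these diagnostics (and parameter standard errors) directly in their regression summaries.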
In [11]:
x_test = np.linspace(0, 1)
x_test
Out[11]:
In [12]:
y_pred = model.predict(x_test.reshape(-1,1))
In [13]:
plt.scatter(X, y)
plt.plot(x_test, y_pred);
plt.show()
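Consistent with its prediction focus, scikit-learn's evaluation tools score predictions rather than parameters. A minimal sketch of how the fit above might be scored, assuming the same synthetic data (the seed and metric choices are mine):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Regenerate the synthetic data with a fixed seed for reproducibility
rng = np.random.default_rng(42)
X = rng.random(100).reshape(-1, 1)
y = X.ravel() + 0.1 * rng.standard_normal(100)

model = LinearRegression(fit_intercept=True).fit(X, y)
y_hat = model.predict(X)

r2 = r2_score(y, y_hat)             # coefficient of determination; closer to 1 is better
mse = mean_squared_error(y, y_hat)  # average squared residual
```

In practice you would score predictions on held-out data (e.g. via `train_test_split`) rather than on the training set, since in-sample scores overstate predictive performance.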